Add crawled urls #2232
Conversation
…during version bump
Walkthrough

This pull request introduces several modifications across the repository. In the workflow scripts, logic is enhanced to distinguish version changes in the package configuration. Database migrations add new tables while removing unused columns and ensuring proper constraints. Multiple projects update dependency versions and simplify ThemeProvider configurations. Enhancements to web crawler components include new UI elements, utility functions, and API endpoints for both initiating crawls and fetching their statuses. Additional type safety improvements and environment variable configurations are also introduced.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant U as User
    participant CB as CrawlButton
    participant API as Crawl API Endpoint
    participant SCM as SCRAPER Service
    participant DB as Database
    U->>CB: Clicks "Crawl" button
    CB->>API: Sends POST request (/api/v1/workspaces/{wsId}/crawl)
    API->>API: Validate API key/session and input
    API->>DB: Query keys, secrets, and check URL existence
    API->>SCM: Request to crawl the URL
    SCM-->>API: Return crawled data
    API->>DB: Insert crawled results
    API-->>CB: Respond with success
    CB->>U: Trigger onSuccess callback (updates UI)
```

```mermaid
sequenceDiagram
    participant CC as CrawlerContent
    participant API as Status API Endpoint
    participant DB as Database
    loop Every 5 seconds
        CC->>API: Sends GET request for crawl status
        API->>DB: Query crawled_urls and crawled_url_next_urls
        DB-->>API: Return crawler status data
        API-->>CC: Respond with current status and related URLs
    end
```
Actionable comments posted: 11
🔭 Outside diff range comments (2)
apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/(default)/page.tsx (1)
44-54: 💡 Verification agent 🧩 Analysis chain

Verify the impact of removing data transformation.

The data transformation step that added the `href` property has been removed. Please ensure that this property is not required by the table columns or any other components. Run the following script to check if any code relies on the `href` property:

🏁 Script executed:

```shell
#!/bin/bash
# Description: Check for usage of the `href` property in table columns and related components.

# Test: Search for `href` usage in column definitions
rg -A 5 'href' apps/web/src/app/\[locale\]/\(dashboard\)/\[wsId\]/crawlers/columns.ts

# Test: Search for `href` usage in components that might use the table data
rg -A 5 'href' apps/web/src/app/\[locale\]/\(dashboard\)/\[wsId\]/crawlers/
```

Length of output: 3378
Issue Identified: Missing `href` Property for Row Actions

The `RowActions` component in the crawlers module expects a `href` property (accessed as `row.original.href`). Our search confirms that while one cell in `columns.tsx` computes a URL using `row.getValue('id')`, the `RowActions` component still relies on a `href` property that would have been added in the removed data transformation. This discrepancy can cause the view button (or any link relying on the `href`) in the row actions not to render or work as intended.

- `RowActions` Usage: In `apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/row-actions.tsx`, the link is conditionally rendered only if `href` is provided.
- Computed URL: Although `columns.tsx` computes a URL for one of its columns, it passes `row.original.href` to `RowActions`, which will be undefined after the transformation removal.

Please verify whether this behavior is acceptable or if the code should either (a) revert the removal to ensure the `href` property is added to each data row or (b) update the `RowActions` usage to compute the `href` similar to the approach in the column definition.

apps/db/supabase/migrations/20250210132933_new_migration.sql (1)
45-103: ⚠️ Potential issue

Review security implications of broad permissions.

The migration grants full permissions (including truncate) to anonymous users. Consider restricting permissions based on actual requirements.

Consider removing unnecessary permissions, especially for anonymous users:

```diff
-grant truncate on table "public"."workspace_crawlers" to "anon";
-grant trigger on table "public"."workspace_crawlers" to "anon";
-grant references on table "public"."workspace_crawlers" to "anon";
```
🧹 Nitpick comments (12)
apps/web/src/app/api/v1/workspaces/[wsId]/crawl/route.ts (3)
11-28: Validate the URL input more thoroughly

Currently, the code only checks for the existence of `url`, but doesn't validate whether it's a valid or properly formatted URL. Consider using a stricter validation step (e.g., checking for valid protocols) to prevent passing malformed URLs to the crawler.
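A minimal validation sketch (the `isCrawlableUrl` helper below is hypothetical, not part of this PR) could rely on the `URL` constructor and an allow-list of protocols:

```ts
// Hypothetical helper: rejects malformed URLs and non-HTTP(S) protocols
// before the value is forwarded to the scraper service.
function isCrawlableUrl(value: string): boolean {
  try {
    const parsed = new URL(value);
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    return false; // `new URL` throws on malformed input
  }
}

// Usage inside the route handler (sketch):
// if (!url || !isCrawlableUrl(url)) {
//   return NextResponse.json({ message: 'Invalid URL' }, { status: 400 });
// }
```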
86-92: Use structured logging for errors

The code logs errors using `console.log`. In production, a structured logging approach or a centralized logging service is preferable to simplify troubleshooting and to preserve sensitive details securely.
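As one option, a structured logger such as pino (assuming it were added as a dependency; this route does not currently use it) would attach machine-readable context to each error:

```ts
import pino from 'pino';

// One shared logger; fields in the first argument become structured
// JSON properties instead of free-form text.
const logger = pino({ level: process.env.LOG_LEVEL ?? 'info' });

export function logCrawlError(error: unknown, wsId: string) {
  logger.error(
    { err: error, wsId, route: 'workspaces/crawl' },
    'Crawl request failed'
  );
}
```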
94-155: Check for missing or invalid `SCRAPER_URL` in environment

The code fetches `process.env.SCRAPER_URL` without verifying its presence or validity. If `SCRAPER_URL` is not set, it may produce an unhelpful runtime error. Consider returning a 500 status with a clear message if this environment variable is missing.
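A minimal guard, sketched below with a simplified handler body, would fail fast with an actionable message:

```ts
import { NextResponse } from 'next/server';

// Sketch: validate the scraper endpoint before doing any work.
export async function POST() {
  const scraperUrl = process.env.SCRAPER_URL;

  if (!scraperUrl) {
    // Surface a clear error instead of an opaque fetch failure downstream.
    return NextResponse.json(
      { message: 'SCRAPER_URL environment variable is not configured' },
      { status: 500 }
    );
  }

  // ...continue with the crawl request using `scraperUrl`...
  return NextResponse.json({ ok: true });
}
```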
apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/[crawlerId]/crawl-button.tsx (2)

6-14: Ensure consistent error handling or user feedback

The component logs errors to the console but does not provide user-visible feedback. For a better user experience, consider adding a notification/toast or UI indicator if crawling fails.
17-46: Handle non-OK responses more robustly

Currently, the code simply logs an error if `res.ok` is false and then returns. If the server returns a 400 or 403, it might be helpful to parse the JSON body to show a user-friendly message explaining why the crawl failed.
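For instance, the handler could try to read the server-provided reason before giving up; in this sketch the response shape is an assumption, not a guarantee of the API:

```ts
// Sketch: surface the server-provided reason when the crawl request fails.
async function readErrorMessage(res: Response): Promise<string> {
  try {
    const body = (await res.json()) as { message?: string };
    return body.message ?? `Request failed with status ${res.status}`;
  } catch {
    // Body was empty or not JSON; fall back to the status code.
    return `Request failed with status ${res.status}`;
  }
}

// Usage (sketch):
// if (!res.ok) {
//   const reason = await readErrorMessage(res);
//   console.error(reason); // or show it in a toast
//   return;
// }
```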
apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/[crawlerId]/utils.ts (1)

20-53: Handle self-closing tags and complex HTML structures in `formatHTML`

The current approach incrementally adjusts `depth` and relies on `<` / `/>` detection. Self-closing tags, inline elements, or malformed HTML can disrupt the formatted output. Consider leveraging a proper HTML parser for a more reliable formatting approach if your use case demands it.
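If a heavier dependency is acceptable, an HTML-aware formatter can replace the manual depth tracking. A sketch using prettier's HTML parser (assuming prettier is available to the app, which this PR does not establish):

```ts
import * as prettier from 'prettier';

// Sketch: let an HTML-aware formatter handle nesting, self-closing tags,
// and malformed markup instead of tracking depth manually.
export async function formatHTML(html: string): Promise<string> {
  try {
    return await prettier.format(html, { parser: 'html' });
  } catch {
    // Fall back to the raw input if the markup cannot be parsed.
    return html;
  }
}
```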
apps/web/src/app/api/v1/workspaces/[wsId]/crawlers/status/route.ts (1)

15-34: Enhance error handling with detailed messages.

Include the error details in the response for better debugging while ensuring no sensitive information is exposed.

```diff
 if (crawledUrlError) {
   return NextResponse.json(
-    { message: 'Error fetching crawled URL' },
+    {
+      message: 'Error fetching crawled URL',
+      error: crawledUrlError.message || 'Unknown error'
+    },
     { status: 500 }
   );
 }
```

apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/(default)/page.tsx (1)
59-101: Implement search functionality.

The `getData` function accepts a search query parameter `q` but doesn't utilize it in the query builder. Consider implementing search functionality to filter the crawlers based on the query.

Apply this diff to add search functionality:

```diff
 const queryBuilder = supabase
   .from('workspace_crawlers')
   .select('*')
   .order('created_at', { ascending: true })
   .eq('ws_id', wsId);

+if (q) {
+  queryBuilder.ilike('url', `%${q}%`);
+}

 if (page && pageSize) {
```
142-177: Enhance URL display with truncation and copy functionality.

Long URLs may overflow the container, and there's no easy way to copy them.

Apply this diff to improve the URL display:

```diff
+'use client';
 import CrawlButton from './crawl-button';
 import { formatHTML, unescapeMarkdownString } from './utils';
 import { MemoizedReactMarkdown } from '@/components/markdown';
+import { Copy } from 'lucide-react';
+import { toast } from 'sonner';

 // ...

 <span
   className={
     relatedUrl.skipped ? 'text-muted-foreground' : ''
   }
 >
-  {relatedUrl.url}
+  <div className="flex items-center gap-2">
+    <span className="truncate max-w-[500px]" title={relatedUrl.url}>
+      {relatedUrl.url}
+    </span>
+    <button
+      onClick={() => {
+        navigator.clipboard.writeText(relatedUrl.url);
+        toast.success('URL copied to clipboard');
+      }}
+      className="p-1 hover:bg-muted rounded-md"
+    >
+      <Copy className="h-4 w-4" />
+    </button>
+  </div>
 </span>
```
129-134: Consider adding indexes for URL lookups.

The `crawled_urls` and `crawled_url_next_urls` tables will likely be queried by URL frequently. Consider adding indexes to improve query performance.

```diff
 create table "public"."crawled_urls" (
   "id" uuid not null default gen_random_uuid(),
   "url" text not null,
   "html" text,
   "markdown" text,
   "created_at" timestamp with time zone not null default now()
 );

+CREATE INDEX idx_crawled_urls_url ON public.crawled_urls USING btree (url);

 create table "public"."crawled_url_next_urls" (
   "origin_id" uuid not null default gen_random_uuid(),
   "url" text not null,
   "skipped" boolean not null,
   "created_at" timestamp with time zone not null default now()
 );

+CREATE INDEX idx_crawled_url_next_urls_url ON public.crawled_url_next_urls USING btree (url);
```

Also applies to: 136-142
.github/workflows/check-and-bump-versions.yaml (2)
296-310: Improve code maintainability.

While the logic is correct, consider these improvements for better maintainability:

- Extract magic number "1" as a constant
- Store "package.json" as a variable
- Reduce nesting by early returns

Apply this diff to improve the code:

```diff
+          # Constants
+          PACKAGE_JSON="package.json"
+          SINGLE_FILE=1
+
           # If only package.json changed, check if it's only version changes
-          if [ "$(echo "$SOURCE_FILES_CHANGED" | wc -l)" -eq 1 ] && [ "$(echo "$SOURCE_FILES_CHANGED" | grep "package.json$")" ]; then
-            if check_package_json_changes; then
-              echo "Only version field changed in package.json, skipping version bump"
-              SHOULD_BUMP=false
-            else
-              echo "Package.json has meaningful changes"
-              SHOULD_BUMP=true
-            fi
-          else
-            echo "Source files changed:"
-            echo "$SOURCE_FILES_CHANGED"
-            SHOULD_BUMP=true
-          fi
+          # Set default to true for source changes
+          SHOULD_BUMP=true
+
+          # Check if only package.json changed
+          if [ "$(echo "$SOURCE_FILES_CHANGED" | wc -l)" -eq $SINGLE_FILE ] && \
+             [ "$(echo "$SOURCE_FILES_CHANGED" | grep "$PACKAGE_JSON$")" ] && \
+             check_package_json_changes; then
+            echo "Only version field changed in $PACKAGE_JSON, skipping version bump"
+            SHOULD_BUMP=false
+            return
+          fi
+
+          echo "Source files changed:"
+          echo "$SOURCE_FILES_CHANGED"
```
271-271: Remove trailing spaces.

Remove trailing spaces to maintain consistent code style.

Apply this diff to fix the trailing spaces:

```diff
-          DIFF=$(git diff HEAD~1 HEAD -- package.json) 
-          
+          DIFF=$(git diff HEAD~1 HEAD -- package.json)
+
           # Count number of changed lines that aren't the version field
-          NON_VERSION_CHANGES=$(echo "$DIFF" | grep -v '"version":' | grep '^[+-]' | wc -l) 
-          
+          NON_VERSION_CHANGES=$(echo "$DIFF" | grep -v '"version":' | grep '^[+-]' | wc -l)
+
```

Also applies to: 274-274
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 271-271: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (23)
- .github/workflows/check-and-bump-versions.yaml (2 hunks)
- apps/db/supabase/migrations/20250210132933_new_migration.sql (1 hunks)
- apps/external/package.json (1 hunks)
- apps/mira/package.json (1 hunks)
- apps/mira/src/components/providers.tsx (1 hunks)
- apps/nova/package.json (1 hunks)
- apps/nova/src/app/[locale]/(dashboard)/[wsId]/layout.tsx (2 hunks)
- apps/nova/src/components/theme-provider.tsx (0 hunks)
- apps/rewise/src/components/providers.tsx (1 hunks)
- apps/web/.env.example (1 hunks)
- apps/web/package.json (1 hunks)
- apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/(default)/page.tsx (1 hunks)
- apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/[crawlerId]/crawl-button.tsx (1 hunks)
- apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/[crawlerId]/crawler-content.tsx (1 hunks)
- apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/[crawlerId]/page.tsx (2 hunks)
- apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/[crawlerId]/utils.ts (1 hunks)
- apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/columns.tsx (1 hunks)
- apps/web/src/app/api/v1/workspaces/[wsId]/crawl/route.ts (1 hunks)
- apps/web/src/app/api/v1/workspaces/[wsId]/crawlers/status/route.ts (1 hunks)
- apps/web/src/components/providers.tsx (1 hunks)
- packages/supabase/src/next/server.ts (1 hunks)
- packages/types/src/supabase.ts (1 hunks)
- turbo.json (1 hunks)
💤 Files with no reviewable changes (1)
- apps/nova/src/components/theme-provider.tsx
✅ Files skipped from review due to trivial changes (2)
- apps/external/package.json
- apps/web/.env.example
🧰 Additional context used
🪛 YAMLlint (1.35.1)
.github/workflows/check-and-bump-versions.yaml
[error] 271-271: trailing spaces
(trailing-spaces)
[error] 274-274: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (16)
apps/mira/src/components/providers.tsx (1)
9-9: LGTM! Theme options simplified.

The reduction to three core themes ('system', 'light', 'dark') aligns with the broader theme simplification across the application.
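For reference, the simplified setup looks roughly like this next-themes sketch (the wrapper name and surrounding props are illustrative, not copied from the repo):

```tsx
import { ThemeProvider } from 'next-themes';
import type { ReactNode } from 'react';

// Illustrative wrapper: only the three core themes are exposed.
export function Providers({ children }: { children: ReactNode }) {
  return (
    <ThemeProvider
      attribute="class"
      themes={['system', 'light', 'dark']}
      defaultTheme="system"
      enableSystem
    >
      {children}
    </ThemeProvider>
  );
}
```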
apps/web/src/components/providers.tsx (1)
9-9: LGTM! Consistent theme implementation.

The theme configuration matches the implementation in other applications, maintaining consistency across the codebase.
apps/rewise/src/components/providers.tsx (1)
10-10: LGTM! Theme configuration properly implemented.

The theme array matches other applications while maintaining its specific dark theme default.
apps/nova/src/app/[locale]/(dashboard)/[wsId]/layout.tsx (1)
4-4: LGTM! Theme provider properly migrated.

The migration from custom theme provider to direct usage of next-themes is properly implemented with consistent configuration.

Let's verify the removal of the custom theme provider:

Also applies to: 21-27

❌ Verification inconclusive

🏁 Script executed:

```shell
#!/bin/bash
# Check if the old theme provider file has been removed
rg -l "theme-provider" apps/nova/src/components/
```

Length of output: 48

LGTM – Theme provider migration verified.

The search for "theme-provider" in the components directory returned no results, suggesting that the old custom theme provider has been removed. The import in `apps/nova/src/app/[locale]/(dashboard)/[wsId]/layout.tsx` now correctly uses `next-themes` with configuration matching elsewhere in the codebase. Please perform a manual verification for any residual references if needed.
apps/web/src/app/api/v1/workspaces/[wsId]/crawl/route.ts (1)
5-10: Confirm the need for `Promise` in params

The `interface Params` uses `params: Promise<{ wsId: string }>;`, which is unusual. Typically, Next.js handlers expect plain objects rather than Promises. Verify that returning a Promise for `params` is intentional, as it may complicate or delay extracting `wsId`.
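For context, if the app is on Next.js 15, route-handler `params` are delivered asynchronously and a Promise-typed interface is the expected shape; a minimal sketch of that pattern (handler body simplified):

```ts
import { NextResponse } from 'next/server';

interface Params {
  params: Promise<{ wsId: string }>;
}

// In Next.js 15, dynamic route params must be awaited before use.
export async function POST(_req: Request, { params }: Params) {
  const { wsId } = await params;
  return NextResponse.json({ wsId });
}
```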
apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/[crawlerId]/utils.ts (1)

1-18: Confirm edge-case handling for escaped sequences

`unescapeMarkdownString` replaces well-known escape sequences, but edge cases like `\\\n` or nested escapes may not be addressed. Verify if more robust or specialized handling is required for your use case.
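A single-pass, table-driven variant is one way to avoid ordering pitfalls of chained replacements; the mapping in this sketch is illustrative and would need to match the escapes the crawler actually emits:

```ts
// Sketch: resolve each escape in one pass so earlier replacements
// cannot create new sequences that later replacements re-interpret.
const ESCAPES: Record<string, string> = {
  '\\n': '\n',
  '\\t': '\t',
  '\\"': '"',
  "\\'": "'",
  '\\\\': '\\',
};

export function unescapeMarkdownString(input: string): string {
  return input.replace(/\\(\\|n|t|"|')/g, (match) => ESCAPES[match] ?? match);
}
```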
49-53: LGTM!

The response format is clean and handles null values appropriately.
apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/[crawlerId]/page.tsx (2)
39-54: LGTM!

The component structure is clean and props are passed correctly.

13-38: 🛠️ Refactor suggestion

Optimize data fetching and enhance error handling.

- Run queries in parallel to improve performance
- Add error handling for subsequent queries

```diff
 const { wsId, crawlerId } = await params;
 const supabase = await createClient();

-const { data: crawler } = await supabase
-  .from('workspace_crawlers')
-  .select('*')
-  .eq('id', crawlerId)
-  .single();
+const [crawlerResult, crawledUrlResult] = await Promise.all([
+  supabase
+    .from('workspace_crawlers')
+    .select('*')
+    .eq('id', crawlerId)
+    .single(),
+  supabase
+    .from('crawled_urls')
+    .select('*')
+    .eq('url', crawler.url)
+    .maybeSingle(),
+]);

-if (!crawler) notFound();
+if (crawlerResult.error) throw new Error('Failed to fetch crawler');
+if (!crawlerResult.data) notFound();
+const crawler = crawlerResult.data;

-const { data: crawledUrl } = await supabase
-  .from('crawled_urls')
-  .select('*')
-  .eq('url', crawler.url)
-  .maybeSingle();
+if (crawledUrlResult.error) throw new Error('Failed to fetch crawled URL');
+const crawledUrl = crawledUrlResult.data;
```

Likely an incorrect or invalid review comment.
packages/supabase/src/next/server.ts (1)
4-4: LGTM! Good type safety improvement.

Replacing `any` with `ReadonlyRequestCookies` improves type safety and makes the code more maintainable.

Also applies to: 7-7
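As a side note, the type can be derived without importing Next.js internals; the `readSessionCookie` helper and cookie name in this sketch are hypothetical:

```ts
import { cookies } from 'next/headers';

// Derive the cookie-store type from next/headers instead of using `any`.
type ReadonlyRequestCookies = Awaited<ReturnType<typeof cookies>>;

// Illustrative consumer of the typed cookie store ('sb-session' is an
// example cookie name, not the repo's actual key).
export function readSessionCookie(cookieStore: ReadonlyRequestCookies) {
  return cookieStore.get('sb-session')?.value;
}
```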
turbo.json (1)
27-27: LGTM! Addition of SCRAPER_URL environment variable.

The addition of the `SCRAPER_URL` environment variable aligns well with the PR's objective of adding crawled URLs functionality.
111-111: Verify jotai version update compatibility.

The jotai package is being updated from ^2.11.3 to ^2.12.0. Let's verify this version exists and check for any breaking changes.
✅ Verification successful
🌐 Web query:
What's new in jotai version 2.12.0? Are there any breaking changes?
💡 Result:
Jotai's release history up to version 2.11.3 shows significant internal improvements but no documentation exists in these sources about version 2.12.0 specifically[1][2]. Based on the available information:
Version Patterns:
- Recent releases (v2.11.x) focused on internal architecture changes and stability fixes rather than public API updates[2].
- The project follows semantic versioning, so minor versions (e.g., 2.X) typically avoid breaking changes unless explicitly noted[1][4].
Breaking Changes:
Major breaking changes were introduced in earlier v2 releases:
- Async atoms now require explicit promise handling (`await get(asyncAtom)` instead of automatic resolution)[1][3]
- Removed deprecated APIs like `initialValues` and `scope` props in favor of `store`-based patterns[1]
- TypeScript type updates for writable atoms (`WritableAtom<Value, Args, Result>`)[1]

For v2.12.x specifically, check Jotai's official GitHub Releases page or npm registry entries (not included in these sources), as minor patches after v2.11.x likely address edge cases rather than introduce breaking changes[2][4].
Citations:
- 1: https://jotai.org/docs/guides/migrating-to-v2-api
- 2: https://github.com/pmndrs/jotai/releases
- 3: RFC: New Jotai v2 API pmndrs/jotai#1514
- 4: https://jotai.org/docs
- 5: https://jotai.org/docs/core/atom
- 6: https://webprogrammingforappsandservices.sdds.ca/Managing-Application-State/introducing-jotai
- 7: https://github.com/jotaijs/jotai-signal/blob/main/CHANGELOG.md
- 8: https://github.com/jotaijs/jotai-cache/blob/main/CHANGELOG.md
- 9: https://jotai.org
- 10: https://www.npmjs.com/package/react-native-mmkv/v/2.12.0
Jotai version update verified as compatible
Based on the latest web query, the upgrade from ^2.11.3 to ^2.12.0 is a minor version bump following semantic versioning. There is no official documentation indicating breaking changes for 2.12.0, and the expected focus on stability fixes/internal improvements suggests compatibility.
- Confirmed that the update adheres to semantic versioning.
- No unexpected breaking changes are reported for 2.12.0.
apps/mira/package.json (1)
111-111: Same jotai version update as in nova package.

This change mirrors the jotai version update in the nova package, maintaining consistency across the project.
apps/db/supabase/migrations/20250210132933_new_migration.sql (2)
1-8: Verify data migration plan for removed columns.

The removal of `html_ids`, `type`, and `url` columns from `workspace_datasets` could affect existing data. Ensure there's a data migration plan in place.
: LGTM! Well-structured workspace_crawlers table.The new table includes all necessary fields and appropriate constraints for managing crawled content.
apps/web/package.json (1)
119-119
: Same jotai version update as in other packages.This change completes the coordinated jotai version update across all packages.
apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/[crawlerId]/crawler-content.tsx
Show resolved
Hide resolved
apps/web/src/app/[locale]/(dashboard)/[wsId]/crawlers/[crawlerId]/crawler-content.tsx
Show resolved
Hide resolved
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Summary by CodeRabbit
New Features
Enhancements